Policy-Gradient for Robust Planning
Authors
Abstract
Real-world Decision-Theoretic Planning (DTP) is a very challenging research field. A common approach is to model such problems as Markov Decision Problems (MDPs) and to apply dynamic programming techniques. Yet, two major difficulties arise: (1) dynamic programming does not scale with the number of tasks, and (2) the probabilistic model may be uncertain, leading to the choice of unsafe policies. We build on Policy Gradient algorithms to address the first difficulty, and on robust decision-making to address the second, through algorithms that train competing learning agents. The first agent learns the plan while the second learns the model most likely to upset that plan. It is known from gradient-based game theory that at least one player may fail to converge, so we focus on convergence of the robust plan only, using non-symmetric algorithms.
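The competing-agents idea can be illustrated with a small sketch. Everything below is our own toy construction, not the paper's algorithm: a 2×2 reward matrix stands in for an uncertain MDP, the "planner" chooses P(action 0) = p by projected gradient ascent, and the adversarial "model" agent chooses P(outcome 0) = q by projected gradient descent. The updates are non-symmetric, as in the abstract: the adversary takes several inner steps per planner step, and we track only the (averaged) plan, since the adversary's iterates need not settle.

```python
import numpy as np

# Toy reward matrix (hypothetical): reward[action, outcome].
R = np.array([[1.0, -1.0],
              [-0.5, 0.5]])

def value(p, q):
    """Expected reward of plan p against model q (bilinear in p, q)."""
    return np.array([p, 1 - p]) @ R @ np.array([q, 1 - q])

p, q = 0.5, 0.5                 # planner and adversary parameters
lr_plan, lr_model = 0.02, 0.5   # asymmetric step sizes (our choice)
p_avg, n = 0.0, 0

for step in range(500):
    # Inner loop: the adversarial model agent (near-)best-responds.
    for _ in range(5):
        dV_dq = value(p, 1.0) - value(p, 0.0)   # exact, since V is linear in q
        q = np.clip(q - lr_model * dV_dq, 0.0, 1.0)
    # Outer step: the planner ascends against the current worst model.
    dV_dp = value(1.0, q) - value(0.0, q)       # exact, since V is linear in p
    p = np.clip(p + lr_plan * dV_dp, 0.0, 1.0)
    p_avg += p
    n += 1

p_avg /= n
# Robustness of the averaged plan: its value under the worst model.
worst = min(value(p_avg, 0.0), value(p_avg, 1.0))
print(round(p_avg, 2), round(worst, 2))
```

In this game the naive plan p = 0.5 can be driven to a worst-case value of -0.25, while the averaged plan produced by the asymmetric scheme ends up near the saddle point and is nearly immune to the adversary; averaging the planner's iterates is a standard remedy when raw gradient play cycles instead of converging.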
Similar resources
Stochastic Motion Planning for Hopping Rovers on Small Solar System Bodies
Hopping rovers have emerged as a promising platform for the future surface exploration of small Solar System bodies, such as asteroids and comets. However, hopping dynamics are governed by nonlinear gravity fields and stochastic bouncing on highly irregular surfaces, which pose several challenges for traditional motion planning methods. This paper presents the first ever discussion of motion pl...
Full text

A Two-Teams Approach for Robust Probabilistic Temporal Planning
Large real-world Probabilistic Temporal Planning (PTP) is a very challenging research field. A common approach is to model such problems as Markov Decision Problems (MDPs) and use dynamic programming techniques. Yet, two major difficulties arise: (1) dynamic programming does not scale with the number of tasks, and (2) the probabilistic model may be uncertain, leading to the choice of unsafe policies. ...
Full text

Equilibrium Policy Gradients for Spatiotemporal Planning
In spatiotemporal planning, agents choose actions at multiple locations in space over some planning horizon to maximize their utility and satisfy various constraints. In forestry planning, for example, the problem is to choose actions for thousands of locations in the forest each year. The actions at each location could include harvesting trees, treating trees against disease and pests, or doin...
Full text

Gradient-based Reinforcement Planning in Policy-Search
We introduce a learning method called “gradient-based reinforcement planning” (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with n...
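The notion of an exact policy gradient computed by planning, before any interaction with the environment, can be sketched as follows. This is our own tabular illustration (not GREP itself): for a hypothetical 2-state, 2-action MDP with a softmax policy, the value function, action values, and discounted state-visitation frequencies are all obtained analytically from the model, giving the exact gradient of the expected discounted return.

```python
import numpy as np

gamma = 0.9
P = np.array([                      # P[a, s, s']: toy transition model
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.1, 0.9], [0.7, 0.3]],
])
r = np.array([[1.0, 0.0],           # r[s, a]: expected immediate reward
              [0.0, 2.0]])
mu0 = np.array([0.5, 0.5])          # initial state distribution

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def exact_gradient(theta):
    pi = softmax(theta)                              # pi[s, a]
    P_pi = np.einsum('sa,ast->st', pi, P)            # state transitions under pi
    r_pi = (pi * r).sum(axis=1)
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum('ast,t->sa', P, V)
    d = np.linalg.solve(np.eye(2) - gamma * P_pi.T, mu0)  # discounted visits
    # Softmax policy gradient: dJ/dtheta[s,a] = d(s) pi(a|s) (Q(s,a) - V(s)).
    grad = d[:, None] * pi * (Q - V[:, None])
    return grad, mu0 @ V                             # gradient and J(theta)

theta = np.zeros((2, 2))
for _ in range(200):                # improve the policy purely by planning
    grad, J = exact_gradient(theta)
    theta += 0.5 * grad

print(round(J, 3))
```

Because every quantity comes from the model, the gradient here is exact rather than a sampled estimate; it can be checked against finite differences of J, and ascent on it improves the policy before a single action is taken in the environment.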
متن کاملar X iv : c s . A I / 01 11 06 0 v 1 28 N ov 2 00 1 Gradient - based Reinforcement Planning in Policy
We introduce a learning method called “gradient-based reinforcement planning” (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with n...
Full text